fix(resolveGgmlTypeOption): accept case-insensitive ggml type names#607
Open
andreinknv wants to merge 1 commit into
Open
fix(resolveGgmlTypeOption): accept case-insensitive ggml type names#607andreinknv wants to merge 1 commit into
andreinknv wants to merge 1 commit into
Conversation
Previously `resolveGgmlTypeOption("q8_0")` returned `undefined` because
the lookup used `Object.hasOwn(GgmlType, option)` against the
uppercase-by-convention enum keys (`Q8_0`, `F16`, `BF16`, ...). The
caller then fell back to F16 silently — no warning, no error — making
this a frustrating-to-debug "why is my Q8 KV cache acting like FP16"
class of issue.
This change uppercases the input string before the lookup. Numeric
GgmlType enum values are still accepted unchanged. Empty / null /
unrecognized values still return `undefined`.
Affects:
- `experimentalKvCacheKeyType` / `experimentalKvCacheValueType` on
LlamaContextOptions and LlamaModelOptions
- The CLI's `--kv-cache-key-type` / `--kv-cache-value-type` flags
(`resolveCommandGgufPath.ts`)
After:
resolveGgmlTypeOption("q8_0") // => GgmlType.Q8_0
resolveGgmlTypeOption("Q8_0") // => GgmlType.Q8_0 (unchanged)
resolveGgmlTypeOption("bf16") // => GgmlType.BF16
resolveGgmlTypeOption("nope") // => undefined (unchanged)
3 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
resolveGgmlTypeOption("q8_0")currently returnsundefined. The lookup usesObject.hasOwn(GgmlType, option)against the uppercase-by-convention enum keys (Q8_0,F16,BF16, …), so any lowercase/mixed-case input falls through unrecognized. The caller then quietly defaults to F16 with no warning — a silent, hard-to-debug "why is my Q8 KV cache behaving like FP16?" failure.This PR uppercases the string input before the lookup. Numeric
GgmlTypevalues still resolve unchanged. Invalid strings still returnundefined.Why
Two callsites that consume this string:
LlamaContextOptions.experimentalKvCacheKeyType/experimentalKvCacheValueType— operators set these from config files, where lowercase quant identifiers (q8_0,q4_0,bf16) are common because GGUF filenames carry them in that case (e.g.qwen2.5-coder-7b-instruct-q4_k_m.gguf).CLI flags
--kv-cache-key-type/--kv-cache-value-type— same expectation; users mirror the lowercase GGUF naming.Reproduction (matches what surfaced while benchmarking a quantized-KV configuration): pass
"q8_0"toexperimentalKvCacheKeyType, run a context, observe VRAM/perf numbers that match the F16 baseline instead of Q8. No error is raised — the silent fallback makes it look like the override is working when it isn't.After
Compatibility
undefinednow resolve when their uppercase form is a valid key. This is a strict superset of the prior accepted-input set — no caller that depended onundefinedfor those inputs should exist (the API contract is "valid → enum value, invalid → undefined").Test plan
resolveGgmlTypeOption("q8_0")now returnsGgmlType.Q8_0;"Q8_0"still does."nope","") still returnundefined.🤖 Generated with Claude Code